Women in Data Science Blog Post

Blog Post for CS0451
Author

Otis Milliken

Published

March 4, 2024


Abstract

This blog post discusses the importance of women in data science through the lens of the book Solving the Equation: The Variables for Women’s Success in Engineering and Computing by Dr. Corbett and Dr. Hill, and through four talks by Dr. Amy Yuen, Dr. Jessica L’Roe, Dr. Sarah Brown, and Dr. Laura Biester. Each talk tackles a different problem or domain in data science and highlights the work being done by women in the field. My biggest takeaway was that many women are doing incredibly interesting work in data science despite being a minority in the field.

Short Essay

Below, in a short essay, I highlight some ideas from the book Solving the Equation: The Variables for Women’s Success in Engineering and Computing by Dr. Corbett and Dr. Hill that I think are especially important.

Since the 1990s, women have been massively underrepresented in the computing industry despite a rapid rise in women’s college enrollment. This underrepresentation in computing, math, and engineering is a major problem because these industries have been major sources of high salaries and good working conditions. Leaving women out of these industries means that they miss out on high-income jobs and therefore on wealth accumulation. The problem doesn’t only impact women; it also hurts men. One study from India found that male leaders held less implicit bias when they had female leaders alongside them (Beaman et al., 2009). Inclusion doesn’t breed animosity; it actually creates more equitable working conditions. Keeping women out of these industries also means missing out on valuable insights and innovations. It is important to note that the exclusion of women from the computing industry hasn’t always been the case.

During World War II, women actually made up the majority of the computing workforce. This share declined drastically after the war, but in the 1950s and 1960s the proportion of women in computing was similar to what it is today (around 26%). The percentage reached a peak around 1990, with roughly 35% of the computing workforce being women. Computing during the 1950s and 1960s was largely seen as an extension of administrative work, so women were encouraged to participate in large numbers. The decline after this peak is often attributed to the rise of personal computers, which were seen as new toys for boys, and to boys’ growing interest in the area. This early experience gave boys an advantage as the computing market expanded. In recent years, however, major companies have launched significant initiatives to raise the number of women in computing.

There are other ways to help mitigate the issue. One of them is holding conferences like the WiDS conference. Putting a spotlight on women in the industry is an important step toward reducing stigma and feelings of isolation, and it also provides role models for women in computer science. A third of women in private-sector technical jobs feel extremely isolated at work, and four out of ten female engineers reported a lack of role models. The book mentions how seeing other women in leadership roles can actually protect women from the harmful effects of stereotyping (Van Loo & Rydell, 2014). These effects cascade: an increase in female representation reduces feelings of isolation, which in turn leads to even more representation. These talks also begin the process of building social networks, which have historically been dominated by men. The book highlights research finding that these male-dominated social networks exclude women (Faulkner, 2009a). Having women talk about their professional experiences and connect with students begins the process of opening up these networks to women.

Overall, it takes all of us to fix the lack of women in computing, and conferences like WiDS help showcase the amazing work that women are doing in the industry. I’d like to emphasize that this isn’t a women’s issue; the underrepresentation of women in the industry hurts us all. Therefore it is up to both women and men to help mitigate it.

For this blog, I also attended a Women in Data Science conference hosted by Middlebury. The conference included three lightning talks, one keynote speaker, and three alumni working in the industry.

First Lightning Talk Dr. Amy Yuen

Dr. Yuen’s talk discussed whether the United Nations Security Council is truly a democratic institution. At first glance, the voting rules seem to inherently favor the powerful, since certain members are given veto power. Why, then, do other countries spend significant resources running for seats on the Security Council when they have significantly less power? The talk analyzed two ways of looking at democratic institutions: institutional rules, which cover voting inside the council and serving on it, and representation, which looks at which issues are discussed and how much the non-permanent members participate. The overall finding was that while the Security Council’s institutional rules aren’t very democratic, it is fairly representative and inclusive. From the talk, I learned how data can be used for questions that are not obviously data-oriented. I wouldn’t have initially thought to use data analysis to judge how democratic an institution is, since the question feels vaguer than others, but now I think data can be very useful for backing up one’s claims.

Second Lightning Talk Dr. Jessica L’Roe

Dr. L’Roe’s talk began with a note that in college she had only one woman professor and that, growing up in North Carolina, she was no stranger to constantly fighting gender expectations. The content of her talk was a cursory overview of several projects she has worked on, followed by a more in-depth conversation about a specific project in Uganda. The first project looked at land registration in Brazil; her work tried to encourage registration so that people could be more accurately held culpable for deforestation. The second project looked at formalizing gold mining in Peru. Most of the talk discussed a project on landscape changes around Kibale National Park in Uganda. The area faced an unusual situation in which someone was planting many possibly invasive trees. Her work showed that many locals had given up their land to non-local people (sometimes to invest in other things like education), and this was driving the tree plantations. Dr. L’Roe ended her talk by saying that women are no longer a rarity in these fields and that anyone should pursue the things they are interested in.

Third Lightning Talk Dr. Laura Biester

Dr. Biester’s talk was about computational linguistic models of mental health. This isn’t a new field: computers have been used since the 1980s to analyze language for mental health classification. For example, one indicator is that depressed people use first-person pronouns more frequently. However, natural language processing in this area faces many challenges. One of the main issues is access to high-quality, complete, and objective datasets. If you rely on self-reporting, the data often doesn’t accurately represent the general population. Dr. Biester has used Reddit posts, which allow data to be collected at scale. However, this approach also requires collecting posts from ‘control’ users for comparison. She also used a second university dataset. This methodology proved effective, as her model performed well compared to larger models. Overall, Dr. Biester’s work showcased the importance of data collection, which is almost as important as building the models themselves.
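The pronoun indicator and the need for ‘control’ users go together: a feature like pronoun frequency only means something relative to a baseline group. Below is a minimal sketch of that idea, assuming a simple regex tokenizer and a hand-picked list of first-person singular pronouns. Both are my own assumptions for illustration, not the features or pipeline Dr. Biester actually used, and the helper names are hypothetical.

```python
import re

# Assumed word list: first-person singular pronouns only.
# Contractions like "i'm" are NOT counted by this simple version.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def first_person_rate(text: str) -> float:
    """Fraction of word tokens in `text` that are first-person singular pronouns."""
    # Naive tokenizer: runs of lowercase letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in FIRST_PERSON)
    return hits / len(tokens)


def mean_rate(posts: list[str]) -> float:
    """Average per-post rate over a group of posts (0.0 for an empty group)."""
    if not posts:
        return 0.0
    return sum(first_person_rate(p) for p in posts) / len(posts)


if __name__ == "__main__":
    # Made-up posts purely for illustration, not real data.
    target = ["I feel like my week was hard", "I can't sleep"]
    control = ["The game last night was great", "Anyone know a good recipe?"]
    print("target:", mean_rate(target))
    print("control:", mean_rate(control))
```

Comparing the two group averages, rather than looking at one group alone, is what the control users make possible; a real study would also need far more careful tokenization and many more posts.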

Keynote Dr. Sarah Brown

Dr. Sarah Brown’s keynote tackles some essential questions in machine learning and data science: What is data science? How can we leverage domain knowledge to make better models? How can we make algorithms less biased? She touches on all of these ideas throughout her talk. She begins by describing data science as the intersection of computer science, statistics, and domain knowledge. Domain knowledge, however, isn’t a concrete subject; it can come from anywhere at any time. For example, Dr. Brown learned how to use context in social studies, and she now applies that skill to understanding data. She uses an example from her work with PTSD patients to illustrate this lesson.

Dr. Brown then discusses fairness in models and her work in the area. One problem with models is that they are often built by computer scientists with a computer science viewpoint. Opening up the process to people from other fields can help reduce problems and bias, because they may have insights that computer scientists can’t see. Dr. Brown also emphasizes that we should do fairness checks before even fitting the model. It is necessary to study the problem to see whether a model should be used at all and, if so, what biases we might be inheriting from the data. She discusses how experts in the field hold different opinions on whether fair algorithms are possible. The three main positions are: (1) if you accept some loss of accuracy, you can improve fairness; (2) fair algorithms aren’t possible and shouldn’t be attempted; and (3) there is still progress to be made on improving algorithms. Lastly, Dr. Brown argues that because laws are slow to adapt to new technologies, we need a new methodology that encourages fairness.
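To make the idea of a fairness check before model fitting more concrete, here is one simple example of such a check. It is my own choice of illustration, not necessarily one Dr. Brown described: comparing the rate of positive labels across groups in the training data. The record format and function names are hypothetical.

```python
from collections import defaultdict


def base_rates(records):
    """Fraction of positive labels (label == 1) within each group.

    `records` is assumed to be an iterable of (group, label) pairs
    with labels encoded as 0 or 1.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, label in records:
        totals[group] += 1
        positives[group] += label
    return {g: positives[g] / totals[g] for g in totals}


def max_base_rate_gap(records):
    """Largest difference in positive-label rate between any two groups."""
    rates = base_rates(records)
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Hypothetical toy records, not real data.
    data = [("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0), ("B", 0)]
    print(base_rates(data))
    print(max_base_rate_gap(data))
```

A large gap does not prove that a model trained on this data will be unfair, and equal base rates are only one of several competing fairness definitions. The point is that this kind of check runs on the data alone, before any model exists.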

My main takeaway from this talk was the importance of a standard workflow for data scientists that includes fairness checks before model fitting. If that becomes standard in the industry, we could hopefully reduce algorithmic bias before a model is deployed. I also found it interesting that she was able to construct other algorithms to test whether an algorithm would be biased.

Discussion

As a man in computer science, I think it’s important for me to consider the inclusivity of the workplaces that I hope to join. It is partly a privilege that I rarely have to consider my own gender in the context of my career. This mostly comes from the unnerving truth that, as a man, I will almost always be in the majority in this industry. Some of the facts in the book showed that the industry is far more male-dominated than I believed and changed my view of it. As someone who hopes to move up and eventually become a team leader, I need to be aware of how demographics affect people’s sense of belonging and inclusion. I also found the talks interesting to listen to, and they contained novel ideas about bias, data collection, and fairness that I hadn’t considered.